

Search for: All records

Creators/Authors contains: "Maiti, Anindita"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Abstract Deep generative models have become ubiquitous due to their ability to learn and sample from complex distributions. Despite the proliferation of various frameworks, the relationships among these models remain largely unexplored, a gap that hinders the development of a unified theory of AI learning. In this work, we address two central challenges: clarifying the connections between different deep generative models and deepening our understanding of their learning mechanisms. We focus on Restricted Boltzmann Machines (RBMs), a class of generative models known for their universal approximation capabilities for discrete distributions. By introducing a reciprocal-space formulation for RBMs, we reveal a connection between these models, diffusion processes, and systems of coupled bosons. Our analysis shows that at initialization, the RBM operates at a saddle point, where the local curvature is determined by the singular values of the weight matrix, whose distribution follows the Marchenko-Pastur law and exhibits rotational symmetry. During training, this rotational symmetry is broken due to hierarchical learning, in which different degrees of freedom progressively capture features at multiple levels of abstraction. This leads to a symmetry breaking in the energy landscape, reminiscent of Landau's theory, characterized by the singular values and singular-vector matrices of the weight matrix. We derive the corresponding free energy in a mean-field approximation. We show that in the limit of an infinite-size RBM, the reciprocal variables are Gaussian distributed. Our findings indicate that in this regime, there are modes for which the diffusion process does not converge to the Boltzmann distribution. To illustrate our results, we trained replicas of RBMs with different hidden-layer sizes on the MNIST dataset. 
Our findings not only bridge the gap between disparate generative frameworks but also shed light on the fundamental processes underpinning learning in deep generative models. 
    Free, publicly-accessible full text available August 12, 2026
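The abstract's claim about initialization can be checked numerically: the squared singular values of a random Gaussian weight matrix should follow the Marchenko-Pastur density. The sketch below is an illustration under assumed matrix sizes and a standard 1/sqrt(n) Gaussian initialization, not the paper's code:

```python
import numpy as np

def marchenko_pastur_pdf(x, ratio, sigma2=1.0):
    # MP density for the eigenvalues of W^T W, where W is an n-by-m matrix
    # of i.i.d. N(0, sigma2/n) entries and ratio = m/n <= 1.
    lo = sigma2 * (1.0 - np.sqrt(ratio)) ** 2
    hi = sigma2 * (1.0 + np.sqrt(ratio)) ** 2
    pdf = np.zeros_like(x)
    inside = (x > lo) & (x < hi)
    pdf[inside] = np.sqrt((hi - x[inside]) * (x[inside] - lo)) / (
        2.0 * np.pi * sigma2 * ratio * x[inside]
    )
    return pdf

rng = np.random.default_rng(0)
n, m = 2000, 1000                                     # visible/hidden sizes (illustrative)
W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, m))    # variance-1/n initialization
eigs = np.linalg.svd(W, compute_uv=False) ** 2        # squared singular values

# Compare the empirical spectrum to the MP density at the bin centers.
hist, edges = np.histogram(eigs, bins=40, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
theory = marchenko_pastur_pdf(centers, ratio=m / n)
```

The paper's point is that training breaks this rotationally symmetric picture; repeating the comparison on a trained weight matrix would show singular values leaving the MP bulk.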
  2. Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers’ success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unresolved. Here, we provide a precise answer to these questions in an exactly solvable model of ICL of a linear regression task by linear attention. We derive sharp asymptotics for the learning curve in a phenomenologically rich scaling regime where the token dimension is taken to infinity; the context length and pretraining task diversity scale proportionally with the token dimension; and the number of pretraining examples scales quadratically. We demonstrate a double-descent learning curve with increasing pretraining examples, and uncover a phase transition in the model’s behavior between low and high task diversity regimes: in the low diversity regime, the model tends toward memorization of training tasks, whereas in the high diversity regime, it achieves genuine ICL and generalization beyond the scope of pretrained tasks. These theoretical insights are empirically validated through experiments with both linear attention and full nonlinear Transformer architectures. 
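A toy version of the solvable model described here is easy to write down: a single linear-attention head can implement in-context linear regression, because its output reduces to the query projected onto the empirical input-label correlation, which is one gradient-descent step on the in-context least-squares loss. The sketch below uses assumed dimensions and a Gaussian task, and is not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(1)
d, L = 8, 2000        # token dimension and context length (illustrative)

# One in-context task: noiseless linear labels y_i = w . x_i.
w = rng.normal(size=d)
X = rng.normal(size=(L, d))
y = X @ w

def linear_attention_predict(X, y, x_query):
    # A linear-attention head with identity key/query maps outputs
    # x_query . (1/L) * sum_i y_i x_i: the query scored against the
    # empirical input-label correlation, i.e. one gradient-descent step
    # on the in-context least-squares loss from a zero initialization.
    w_hat = X.T @ y / len(y)
    return x_query @ w_hat

x_query = rng.normal(size=d)
pred = linear_attention_predict(X, y, x_query)
target = x_query @ w
```

Note that the memorization-vs-ICL phase transition in the abstract concerns pretraining over an ensemble of many such tasks; this single-task sketch only shows why linear attention can represent the regression at all.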
  3. Abstract Both the path integral measure in field theory (FT) and ensembles of neural networks (NN) describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-N) limit, the ensemble of networks corresponds to a free FT. Although an expansion in 1/N corresponds to interactions in the FT, other expansions, such as one in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the 1/N expansion, for example through improved behavior with respect to the universal approximation theorem. Given the connected correlators of a FT, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for NN-FTs. Conversely, the correspondence allows one to engineer architectures realizing a given FT by representing action deformations as deformations of NN parameter densities. As an example, φ^4 theory is realized as an infinite-N NN-FT. 
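The infinite-width statement can be checked numerically: as the hidden width N grows, the distribution of network outputs at a fixed input approaches a Gaussian, so the connected four-point function (excess kurtosis) decays, leaving a free theory. A minimal sketch, assuming a one-hidden-layer tanh architecture with standard-normal parameters (illustrative choices, not from the abstract):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_outputs(width, n_nets, x=1.0):
    # Outputs f(x) of an ensemble of random one-hidden-layer networks,
    #   f(x) = (1/sqrt(width)) * sum_j a_j * tanh(b_j * x),
    # with a_j, b_j drawn i.i.d. from N(0, 1).
    a = rng.normal(size=(n_nets, width))
    b = rng.normal(size=(n_nets, width))
    return (a * np.tanh(b * x)).sum(axis=1) / np.sqrt(width)

def excess_kurtosis(f):
    # Fourth cumulant normalized by variance^2; zero for a Gaussian, so
    # it tracks the leading interaction (quartic) term in the FT picture.
    f = f - f.mean()
    return np.mean(f**4) / np.mean(f**2) ** 2 - 3.0

narrow = excess_kurtosis(sample_outputs(width=2, n_nets=50_000))
wide = excess_kurtosis(sample_outputs(width=64, n_nets=50_000))
# The connected 4-point function shrinks roughly like 1/width.
```

The central-limit argument predicts the excess kurtosis falls off as 1/N, which is the sense in which the 1/N expansion generates interactions around the free theory.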
  5. Abstract Dark Yang-Mills sectors, which are ubiquitous in the string landscape, may be reheated above their critical temperature and subsequently go through a confining first-order phase transition that produces stochastic gravitational waves in the early universe. Taking into account constraints from the lattice and from Yang-Mills (center and Weyl) symmetries, we use a phenomenological model to construct an effective potential of the semi-quark-gluon plasma phase, from which we compute the gravitational-wave signal produced during confinement for numerous gauge groups. The signal is maximized when the dark sector dominates the energy density of the universe at the time of the phase transition. In that case, we find that it is within reach of the next-to-next generation of experiments (BBO, DECIGO) for a range of dark confinement scales near the weak scale. 